[SPARK-57575][SQL] Support TIME type in to_char/to_varchar formatting by shrirangmhalgi · Pull Request #56637 · apache/spark

shrirangmhalgi · 2026-06-21T07:17:51Z

What changes were proposed in this pull request?

Extend DateFormatClass (used by to_char / to_varchar / date_format) to accept TimeType input. When the input is TimeType, the expression uses TimeFormatter to format the time-of-day value instead of TimestampFormatter. Date-only pattern fields (e.g., yyyy-MM-dd) are rejected at runtime with a clear error.

Why are the changes needed?

to_char and to_varchar accept DateType and TimestampType but not TimeType. With the TIME data type now supported in Spark, users need a way to format time values to strings using standard datetime patterns (e.g., to_char(time_col, 'HH:mm:ss')).

Does this PR introduce any user-facing change?

Yes. to_char(time_value, format) and to_varchar(time_value, format) now work for TimeType inputs, formatting time-of-day fields (HH, mm, ss, fractional seconds). Passing date-only patterns to a TIME input raises an error.

How was this patch tested?

Unit tests in DateExpressionsSuite (formatting, null handling, date-pattern rejection)
SQL golden file tests in datetime-formatting.sql verifying to_char and to_varchar with TIME literals
All existing DateFormat tests pass (no regressions)

Was this patch authored or co-authored using generative AI tooling?

Yes. Co-Authored using Claude Opus 4.6

shrirangmhalgi · 2026-06-21T07:19:25Z

@MaxGekk could you please review the PR. It is adding TimeType support to to_char / to_varchar using TimeFormatter, with date-pattern rejection and golden file tests.

MaxGekk

1 blocking, 1 non-blocking, 1 nit.
Solid feature with correct value handling and golden coverage. One blocking item: declaring TIME input as the concrete TimeType(6) silently restricts the feature to a single precision.

Design / architecture (1)

datetimeExpressions.scala:1142: TIME input declared as concrete TimeType(6) instead of AnyTimeType; non-6 precisions fail at analysis — see inline

Correctness (1)

DateExpressionsSuite.scala:348: date-pattern rejection raises a raw java.time exception (not a "clear error") and the test under-pins it — see inline

Nits: 1 minor item (see inline comments).

Verification

Traced the input-type coercion for a non-6 TIME. ToCharacterBuilder routes any DatetimeType to DateFormatClass, so TIME(3) reaches it. TypeCollection(TimestampType, TimeType(6)) does not accept TIME(3) by exact match, so ImplicitTypeCasts tries the members in order and coerces it to the first one, TimestampType (canANSIStoreAssign/TypeCoercion treat any DatetimeType->DatetimeType as castable). But TIME->TIMESTAMP has no case in canAnsiCast/canCast, so that inserted cast is invalid and the query fails at analysis. TIME(6) works only because it matches the member's exact precision; AnyTimeType accepts all precisions directly with no cast.

MaxGekk · 2026-06-21T07:59:55Z


  override def inputTypes: Seq[AbstractDataType] =
-    Seq(TimestampType, StringTypeWithCollation(supportsTrimCollation = true))
+    Seq(TypeCollection(TimestampType, TimeType(TimeType.DEFAULT_PRECISION)), StringTypeWithCollation(supportsTrimCollation = true))


This accepts only TIME(6). A TIME value of any other precision (TIME(3), current_time(3), a CAST(... AS TIME(0)) column) doesn't match by exact precision, so it gets implicitly coerced to TimestampType (the first collection member) — and TIME -> TIMESTAMP isn't a valid cast, so the query fails at analysis instead of formatting. Every other TIME-accepting expression (MinutesOfTime, HoursOfTime, TimeFromMicros) uses AnyTimeType, which accepts all precisions 0-6 directly.

Suggested change

Seq(TypeCollection(TimestampType, TimeType(TimeType.DEFAULT_PRECISION)), StringTypeWithCollation(supportsTrimCollation = true))

Seq(TypeCollection(TimestampType, AnyTimeType), StringTypeWithCollation(supportsTrimCollation = true))

Worth adding a regression test on a non-6 precision input — every current test and golden query uses a TIME'...' literal, which defaults to TIME(6), so this gap is untested today.

MaxGekk · 2026-06-21T07:59:55Z

+
+    // Date-only pattern fields should error for TIME input
+    val datePatternExpr = DateFormatClass(timeLit, Literal("yyyy-MM-dd"), UTC_OPT)
+    intercept[Exception] {


intercept[Exception] passes on any throwable and only exercises the interpreted eval, not codegen. The exception it catches is a raw java.time DateTimeException from LocalTime.format (the pattern's syntax is valid, so validatePatternString passes and the failure only surfaces at format time) — there's no Spark error class, so the PR description's "clear error" is overstated. Consider rejecting date-only patterns with a proper Spark error, then pin this test to that specific exception/message and cover the codegen path too.

MaxGekk · 2026-06-21T07:59:55Z


+  test("SPARK-57575: DateFormat with TimeType (to_char/to_varchar)") {
+    // 12:13:14 = (12*3600 + 13*60 + 14) seconds, stored as nanoseconds
+    val timeMicros = (12L * 3600 + 13 * 60 + 14) * 1000000000L


timeMicros holds a nanosecond value (the comment says "stored as nanoseconds", and TimeType is encoded in nanos-since-midnight). Rename to timeNanos to avoid misleading the next reader.

shrirangmhalgi · 2026-06-21T16:12:11Z

Thanks @MaxGekk for the review. All addressed in the latest commit:

Switched to AnyTimeType from TimeType. Good catch, TIME(3) / TIME(0) would have failed at analysis otherwise.
Pinned the test to DateTimeException and added a TIME(0) precision regression test
Renamed timeMicros → timeNanos

MaxGekk

2 addressed, 1 remaining, 0 new.
Prior blocking finding (declare TIME input as AnyTimeType) is fixed; variable renamed to timeNanos; TIME(0) regression test added. Value handling and golden coverage are correct.

Correctness (1)

datetimeExpressions.scala:1152 (follow-up on the prior thread, remaining/partial): date-only-pattern rejection still throws a raw java.time.DateTimeException (no Spark error class) and is asserted only on the interpreted path — fix proposal inline. Non-blocking.

Verification

Re-confirmed the prior blocking fix: AnyTimeType.acceptsType matches every TimeType precision, so TIME(0)/TIME(3) now bind directly with no implicit cast. Also checked the newly-reachable optimizer rule SimplifyDateTimeConversions: both rewrite cases require a GetTimestamp(...TimestampType...) child, which a TIME-typed DateFormatClass never is, so TIME is structurally gated out — no wrong rewrite.

MaxGekk · 2026-06-21T19:56:02Z

+      case _: TimeType =>
+        val tf = timeFormatterOption.getOrElse(
+          TimeFormatter(format.toString, isParsing = false))
+        UTF8String.fromString(tf.format(timestamp.asInstanceOf[Long]))


Follow-up on the prior thread (now that the exception type is pinned). A date-only pattern like yyyy-MM-dd is syntactically valid, so validatePatternString() passes at construction and this format call throws a raw java.time.UnsupportedTemporalTypeException at runtime — there's no Spark error class, so the PR description's "rejected at runtime with a clear error" is still overstated, and the test pins it only on the interpreted .eval path.

Centralizing the TIME format in one helper that both nullSafeEval and doGenCode call maps it to a proper error and covers codegen for free:

def formatTime(tf: TimeFormatter, nanos: Long, pattern: String): UTF8String = try UTF8String.fromString(tf.format(nanos)) catch { case e: DateTimeException => throw QueryExecutionErrors.invalidPatternError("to_char", pattern, e) }

QueryExecutionErrors.invalidPatternError already exists (INVALID_PARAMETER_VALUE.PATTERN) — no new error class needed. Then swap the test's interpreted-only intercept[...] { expr.eval(InternalRow(...)) } for checkErrorInExpression[SparkRuntimeException](expr, "INVALID_PARAMETER_VALUE.PATTERN", ...), which runs interpreted and codegen. If a Spark error is out of scope here, the minimal alternative is to drop "clear error" from the description and use checkExceptionInExpression[DateTimeException] (which still covers codegen). Non-blocking — fine to defer.

shrirangmhalgi · 2026-06-21T20:15:32Z

Thanks @MaxGekk. Wrapped the date-pattern rejection in QueryExecutionErrors.invalidPatternError via a shared DateFormatClass.formatTimeWithError helper that both eval and codegen call. The test now uses checkErrorInExpression[SparkRuntimeException] for full path coverage.

MaxGekk

1 addressed, 0 remaining, 3 new. (3 new = 2 newly introduced, 1 late catch.)
Prior round's date-pattern-rejection finding is fixed (Spark error on both eval paths). Three new: one inaccurate error message, two trivial cleanups.

Correctness (1)

datetimeExpressions.scala:1236: invalid-pattern error names a nonexistent regexp parameter and always reports to_char (also backs date_format/to_varchar) — see inline

Suggestions (2)

datetimeExpressions.scala:1169: unused errClass val in doGenCode — see inline
datetimeExpressions.scala:1229: Scaladoc says "Helper for codegen" but the helper is used by interpreted eval too — see inline

Verification

Re-confirmed the prior remaining fix: date-only patterns on TIME now raise INVALID_PARAMETER_VALUE.PATTERN on both interpreted and codegen paths (checkErrorInExpression). Also checked the newly-reachable SimplifyDateTimeConversions rewrite: with a TIME-typed DateFormatClass now bindable as case-1's inner node, the rewrite stays equivalent — time-only patterns round-trip, date-field patterns error identically on both sides, and the outer match still requires a GetTimestamp(TimestampType) child, so no wrong rewrite.

MaxGekk · 2026-06-22T08:58:45Z

+      UTF8String.fromString(tf.format(nanos))
+    } catch {
+      case e: java.time.DateTimeException =>
+        throw QueryExecutionErrors.invalidPatternError(funcName, pattern, e)


Following up on my own earlier suggestion to reuse this helper — having now seen the rendered message, it's misleading for these functions. invalidPatternError hardcodes parameter -> toSQLId("regexp") (it was written for regexp functions), so this produces The value of parameter(s) ``regexp`` in ``to_char`` is invalid: '<pattern>' — but to_char/to_varchar/date_format have no regexp parameter. And funcName is the literal "to_char" here and at the two codegen sites, yet DateFormatClass also backs date_format and to_varchar, so those are misreported.

Minimal fix is to pass prettyName instead of the literal. Fully dropping the regexp parameter would need a datetime-specific message rather than the regexp-oriented helper — your call whether that's worth it here, but the message is what users will see.

MaxGekk · 2026-06-22T08:58:45Z

-      })
+    left.dataType match {
+      case _: TimeType =>
+        val errClass = QueryExecutionErrors.getClass.getName.stripSuffix("$")


errClass is never used — the error wrapping now lives inside formatTimeWithError, so this is leftover. Remove:

Suggested change

val errClass = QueryExecutionErrors.getClass.getName.stripSuffix("$")

MaxGekk · 2026-06-22T08:58:45Z

 }

+object DateFormatClass {
+  /** Helper for codegen: formats time with proper Spark error on invalid pattern. */


This helper is called from both nullSafeEval (interpreted) and doGenCode, not codegen only:

Suggested change

/** Helper for codegen: formats time with proper Spark error on invalid pattern. */

/** Formats a TIME value, mapping an invalid pattern to a Spark error. Used by both eval and codegen. */

shrirangmhalgi · 2026-06-22T20:33:22Z

Thank you @MaxGekk. All concerns addressed in the latest commit:

prettyName instead of literal "to_char" - now reports the correct function name for date_format / to_varchar too
Removed dead errClass val
Fixed scaladoc to reflect both eval and codegen usage
Also replaced the regexp parameter with format by constructing the SparkRuntimeException directly - the error message now reads correctly for a datetime context

MaxGekk

3 addressed, 0 remaining, 1 new. (1 new = 0 newly introduced, 1 late catch.)
Prior round's three items (error-message parameter, dead errClass, scaladoc) are all addressed. One new — a late catch correcting my own earlier all-clear on SimplifyDateTimeConversions.

Correctness (1)

datetimeExpressions.scala:1234: TIME-input support newly activates SimplifyDateTimeConversions case 1, which drops a date_format→to_timestamp→date_format round-trip and silently changes the result for sub-microsecond TIME — see inline

Verification

Re-confirmed the prior fixes: invalid date/zone-field patterns now raise INVALID_PARAMETER_VALUE.PATTERN with parameter=format`` on both interpreted and codegen paths; pattern syntax errors are caught by TimeFormatter.validatePatternString() (Spark error), matching the timestamp path; legacy and non-legacy golden output are identical (TimeFormatter has no legacy mode). Then enumerated the `SimplifyDateTimeConversions` case-1 rewrite over the inner child's type: equivalent for `TimestampType`/`DateType`/`StringType` and for micro-aligned `TIME(≤6)` with time-only patterns; date/zone-field patterns error identically on both sides; but not equivalent for sub-microsecond `TIME(7/8/9)` with a sub-micro fractional pattern — the unoptimized plan truncates through the micros `TimestampType` intermediate while the optimized plan keeps full nanos. Case 2 is unaffected (its `DateFormatClass` always sits over a `GetTimestamp`/`TimestampType` child).

MaxGekk · 2026-06-22T21:17:30Z


  override def inputTypes: Seq[AbstractDataType] =
-    Seq(TimestampType, StringTypeWithCollation(supportsTrimCollation = true))
+    Seq(TypeCollection(TimestampType, AnyTimeType),


Following up on my own earlier verification of SimplifyDateTimeConversions — I cleared it in the last two rounds, but that was incomplete. Accepting TIME here newly activates case 1 of that rule (sql/catalyst/.../optimizer/expressions.scala:1208), which rewrites date_format(to_timestamp(date_format(X, p), p), p) → date_format(X, p). The rule keeps the inner date_format, and its child X (the _ in e @ DateFormatClass(_, pattern, timeZoneId)) is unconstrained — so with this change it can be TIME. My prior reasoning only checked the outer position (always TimestampType — correct), but the rewrite preserves the inner node.

The elimination is a no-op only when X is micro-aligned. For a sub-microsecond TIME(7/8/9) value with a sub-micro fractional pattern, the original truncates through the micros TimestampType intermediate while the rewrite keeps full nanos:

-- time_col : TIME(9) '12:13:14.123456789' select date_format(to_timestamp(date_format(time_col, 'HH:mm:ss.SSSSSSSSS'), 'HH:mm:ss.SSSSSSSSS'), 'HH:mm:ss.SSSSSSSSS'); -- unoptimized: 12:13:14.123456000 (truncated through micros) -- optimized: 12:13:14.123456789 (rule returns inner date_format(time_col, ...))

Reachable in default config (to_timestamp lowers to GetTimestamp(_, _, TimestampType, ...) after ReplaceExpressions; only spark.sql.timeType.enabled is needed). Minimal fix: guard case 1 to skip when the inner DateFormatClass's child is TimeType (case 2 is fine — its DateFormatClass always sits over a GetTimestamp/TimestampType child). Worth a regression test asserting optimized == unoptimized for a TIME(9) round-trip.

… TIME(0) test, rename variable

…VALUE.PATTERN)

…x scaladoc

shrirangmhalgi · 2026-06-22T21:48:09Z

Thanks @MaxGekk. Added the guard on SimplifyDateTimeConversions case 1 (!child.dataType.isInstanceOf[TimeType]) and a regression test with TIME(9) asserting the rule does not fire.

…micro precision safety)

MaxGekk reviewed Jun 21, 2026

View reviewed changes

shrirangmhalgi force-pushed the SPARK-57575-time-to-char branch from f12a132 to 948e69c Compare June 21, 2026 20:14

shrirangmhalgi force-pushed the SPARK-57575-time-to-char branch from 948e69c to 8617ef9 Compare June 22, 2026 05:35

MaxGekk reviewed Jun 22, 2026

View reviewed changes

shrirangmhalgi force-pushed the SPARK-57575-time-to-char branch from 8617ef9 to 91ca180 Compare June 22, 2026 20:31

MaxGekk reviewed Jun 22, 2026

View reviewed changes

shrirangmhalgi added 5 commits June 22, 2026 14:39

[SPARK-57575][SQL] Support TIME type in to_char/to_varchar formatting

a8512e6

Address MaxGekk's review: use AnyTimeType, pin DateTimeException, add…

eb1e1e7

… TIME(0) test, rename variable

Wrap date-pattern rejection in proper Spark error (INVALID_PARAMETER_…

26f5285

…VALUE.PATTERN)

Fix scalastyle: break long lines in codegen to stay under 100 chars

f42c3e0

Address MaxGekk r3: use prettyName in error, remove dead errClass, fi…

3b1b06b

…x scaladoc

shrirangmhalgi force-pushed the SPARK-57575-time-to-char branch from 91ca180 to 821ac2e Compare June 22, 2026 21:40

Guard SimplifyDateTimeConversions case 1 to skip TimeType child (sub-…

422a229

…micro precision safety)

shrirangmhalgi force-pushed the SPARK-57575-time-to-char branch from 821ac2e to 422a229 Compare June 23, 2026 00:18

	Seq(TypeCollection(TimestampType, TimeType(TimeType.DEFAULT_PRECISION)), StringTypeWithCollation(supportsTrimCollation = true))
	Seq(TypeCollection(TimestampType, AnyTimeType), StringTypeWithCollation(supportsTrimCollation = true))

	/** Helper for codegen: formats time with proper Spark error on invalid pattern. */
	/** Formats a TIME value, mapping an invalid pattern to a Spark error. Used by both eval and codegen. */

Conversation

shrirangmhalgi commented Jun 21, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

shrirangmhalgi commented Jun 21, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Design / architecture (1)

Correctness (1)

Verification

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shrirangmhalgi commented Jun 21, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Correctness (1)

Verification

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shrirangmhalgi commented Jun 21, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Correctness (1)

Suggestions (2)

Verification

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shrirangmhalgi commented Jun 22, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Correctness (1)

Verification

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shrirangmhalgi commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants